BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models

We introduce Benchmarking-IR (BEIR), a robust and heterogeneous evaluation benchmark for information retrieval.

We leverage a careful selection of 18 publicly available datasets from diverse text retrieval tasks and domains and evaluate 10 state-of-the-art retrieval systems including lexical, sparse, dense, late-interaction and re-ranking architectures on the BEIR benchmark.

BM25 is the baseline.
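
For reference, a minimal sketch of how a BM25 lexical scorer works, in plain Python. This is not the BEIR implementation or its API: the function name `bm25_scores`, the whitespace tokenization, the toy documents, and the parameter values `k1=1.2` and `b=0.75` are all assumptions chosen for illustration, and the IDF uses one common smoothed variant.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25.

    Hypothetical helper for illustration; k1 and b are assumed defaults.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF; the "1 +" inside the log keeps it non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus (made up for this sketch), tokenized by whitespace.
docs = [
    "bm25 is a lexical retrieval baseline".split(),
    "dense retrievers embed queries and documents".split(),
    "late interaction models compare token embeddings".split(),
]
query = "lexical baseline".split()

scores = bm25_scores(query, docs)
# Rank document indices from highest to lowest score.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking[0], scores)
```

Only the first toy document shares terms with the query, so it is the only one with a non-zero score. Because scoring depends purely on term overlap, no training data is needed, which is what makes BM25 a natural zero-shot baseline.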

https://github.com/beir-cellar/beir